A new SVD approach to optimal topic estimation
In probabilistic topic models, the quantity of interest, a low-rank
matrix consisting of topic vectors, is hidden in the text corpus matrix,
masked by noise, and Singular Value Decomposition (SVD) is a potentially useful
tool for learning such a matrix. However, different rows and columns of the
matrix are usually on very different scales, and the connection between this
matrix and the singular vectors of the text corpus matrix is usually
complicated and hard to spell out, so using SVD to learn topic models
is challenging.
We overcome these challenges by introducing a proper Pre-SVD normalization of
the text corpus matrix and a proper column-wise scaling for the matrix of
interest, and by revealing a surprising Post-SVD low-dimensional {\it simplex}
structure. The simplex structure, together with the Pre-SVD normalization and
column-wise scaling, allows us to conveniently reconstruct the matrix of
interest, and motivates a new SVD-based approach to learning topic models.
We show that under the popular probabilistic topic model \citep{hofmann1999},
our method has a faster rate of convergence than existing methods in a wide
variety of cases. In particular, for cases where documents are long or is
much larger than , our method achieves the optimal rate. At the heart of the
proofs is a tight element-wise bound on the singular vectors of a multinomially
distributed data matrix; no such bound exists in the literature, so we derive
it ourselves.
We have applied our method to two data sets, Associated Press (AP) and
Statistics Literature Abstract (SLA), with encouraging results. In particular,
there is a clear simplex structure associated with the SVD of the data
matrices, which largely validates our discovery.
Comment: 73 pages, 8 figures, 6 tables; considered two different VH algorithms,
OVH and GVH, and provided theoretical analysis for each algorithm;
reorganized the upper-bound theory part; added a subsection comparing error
rates with other existing methods; provided another improved version of the error
analysis through a Bernstein inequality for martingales
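The abstract names three ingredients (a Pre-SVD normalization, an SVD, and a Post-SVD simplex) without giving formulas. The sketch below illustrates that pipeline under stated assumptions and is not the paper's algorithm: it assumes the normalization divides each word's row by the square root of that word's average frequency, and that the simplex appears after taking entry-wise ratios of the top K left singular vectors against the leading one. The function name `svd_simplex_embedding` is hypothetical, the vertex-hunting (VH) step is omitted, and NumPy is required.

```python
import numpy as np


def svd_simplex_embedding(D: np.ndarray, K: int) -> np.ndarray:
    """Embed each word (row of D) as a point in R^(K-1).

    D is a p x n matrix whose column j holds the word frequencies of
    document j (p words, n documents). Assumes every word occurs somewhere.
    """
    m = D.mean(axis=1)  # average frequency of each word across documents
    if np.any(m <= 0):
        raise ValueError("every word must occur in at least one document")
    # Assumed Pre-SVD normalization: rescale each word's row by 1/sqrt(m_i)
    # so frequent and rare words are on comparable scales.
    D_tilde = D / np.sqrt(m)[:, None]
    U, _, _ = np.linalg.svd(D_tilde, full_matrices=False)
    Xi = U[:, :K].copy()  # top-K left singular vectors
    # For an entrywise-positive D_tilde @ D_tilde.T the leading singular
    # vector is positive up to sign; fix the sign so the ratios below are
    # well defined.
    Xi[:, 0] *= np.sign(Xi[:, 0].sum())
    # Entry-wise ratios cancel each word's overall scale; in the noiseless
    # case the resulting rows lie in a simplex with K vertices.
    return Xi[:, 1:] / Xi[:, [0]]
```

In a noiseless toy model D = A W, two "anchor" words that occur in only one topic map to the same simplex vertex, because their rows of the normalized matrix differ only by a scalar, which the ratio step removes.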
Orthonormal Polynomials on the Unit Circle and Spatially Discrete Painlevé II Equation
We consider the polynomials orthonormal with respect to the weight on the unit circle in the complex plane. The leading coefficient
is found to satisfy a difference-differential (spatially discrete)
equation, which is further proved to approach a third-order differential
equation under double scaling. The third-order differential equation is equivalent
to the Painlevé II equation. The leading coefficient and second leading
coefficient of can be expressed asymptotically in terms of the
Painlevé II function.
Comment: 16 pages
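For reference, the Painlevé II equation in its standard form (as in the NIST DLMF, §32.2) is shown below. The abstract does not give the weight, the double-scaling variables, or the value of the constant α used in the paper, so this is only the generic equation, not the paper's specific reduction:

$$
\frac{d^2 w}{dz^2} = 2\,w^3 + z\,w + \alpha ,
$$

where α is a constant parameter.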
Riemann-Hilbert approach to multi-time processes; the Airy and the Pearcey case
We prove that matrix Fredholm determinants related to multi-time processes
can be expressed in terms of determinants of integrable kernels à la
Its-Izergin-Korepin-Slavnov (IIKS) and are hence related to suitable
Riemann-Hilbert problems, thus extending the known results for the single-time
case. We focus on the Airy and Pearcey processes. As an example application,
we rederive a third-order PDE, found by Adler and van Moerbeke, for the
two-time Airy process.
Comment: 18 pages, 1 figure
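As background, an integrable kernel in the IIKS sense has the form below. The well-known single-time Airy kernel is an instance with N = 2. The matrix-valued multi-time kernels treated in the paper are more involved and are not reproduced here:

$$
K(x,y) = \frac{\sum_{j=1}^{N} f_j(x)\,g_j(y)}{x-y},
\qquad \sum_{j=1}^{N} f_j(x)\,g_j(x) = 0,
$$

$$
K_{\mathrm{Ai}}(x,y) = \frac{\mathrm{Ai}(x)\,\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\,\mathrm{Ai}(y)}{x-y},
\qquad f = (\mathrm{Ai},\ \mathrm{Ai}'),\quad g = (\mathrm{Ai}',\ -\mathrm{Ai}).
$$

The condition $\sum_j f_j(x) g_j(x) = \mathrm{Ai}(x)\mathrm{Ai}'(x) - \mathrm{Ai}'(x)\mathrm{Ai}(x) = 0$ ensures the kernel has no singularity on the diagonal.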
Application Testing Under Developer Specified Device Resource Occupancy
During normal usage, consumer devices may remain switched on for long periods without a shutdown and restart. A long time since the last restart can lead to high usage of device resources such as CPU, memory, and storage. Program performance issues, as well as errors caused by such high usage, are hard to detect in clean functional test environments. This disclosure describes techniques to emulate end-user scenarios, such as long periods since the last restart and high resource utilization, by giving the developer the ability to easily configure the CPU, memory, and storage usage of a device under test (DUT) via a device resources management tool. The tool can invoke low-level operating system APIs to occupy a specified percentage of resources such as CPU, memory, and storage. The extent to which each device resource is occupied can be set independently or in combination. The tool enables developers to emulate various real-world resource-utilization scenarios and can help identify bugs that are otherwise rare or difficult to reproduce.
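The disclosure does not specify the tool's interface. The following is a minimal Python sketch of the idea, assuming one hypothetical helper per resource (`bytes_to_occupy`, `occupy_memory`, `occupy_storage`, `occupy_cpu`). It uses ordinary allocations, file writes, and a busy loop instead of the low-level OS APIs the disclosure mentions.

```python
import time


def bytes_to_occupy(total_bytes: int, percent: float) -> int:
    """Bytes to hold so that `percent` of `total_bytes` is occupied."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(total_bytes * percent / 100)


def occupy_memory(total_bytes: int, percent: float, page_size: int = 4096) -> bytearray:
    """Allocate a buffer; memory stays occupied while the caller keeps it."""
    buf = bytearray(bytes_to_occupy(total_bytes, percent))
    # Write one byte per page so the OS actually commits every page.
    for i in range(0, len(buf), page_size):
        buf[i] = 1
    return buf


def occupy_storage(path: str, nbytes: int, chunk: int = 1 << 20) -> None:
    """Fill `path` with `nbytes` of real data (not a sparse file)."""
    block = b"\xff" * chunk
    with open(path, "wb") as f:
        remaining = nbytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n


def occupy_cpu(percent: float, duration_s: float, period_s: float = 0.1) -> None:
    """Keep one core busy for about `percent` of every `period_s` window."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    busy_s = period_s * percent / 100
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        while time.monotonic() - start < busy_s:
            pass  # spin to burn CPU time
        time.sleep(period_s - busy_s)  # idle for the rest of the window
```

The totals are left to the caller. For example, total storage could come from `shutil.disk_usage(path).total`. `occupy_cpu` loads a single core, so loading several cores would take one process per core (e.g. via `multiprocessing`). Calling the helpers together gives the "combined" occupancy the disclosure describes.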
Virtual devices as a service
Software applications are developed and tested across a large and evolving variety of devices of different types. Development and testing with physical devices is tedious and time-consuming and has scaling and reliability problems. Per the techniques of this disclosure, a large pool of virtual devices is instantiated on a compute cluster and made available to software developers as a service. Developers check out as many virtual devices as needed, carry out testing and development, reset the devices, and release them back to the pool. The techniques obviate the need for physical devices, along with the concomitant issues of cost and reliability, and enable large-scale testing and development and faster device releases.
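The check-out, reset, and release lifecycle can be sketched as a small pool abstraction. This is a hypothetical in-memory model (`VirtualDevice`, `VirtualDevicePool`, `provision`, `check_out`, `release` are illustrative names, not from the disclosure). Launching real emulator instances on a cluster is stubbed out, and "reset" simply clears the device's recorded state.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class VirtualDevice:
    device_id: int
    device_type: str
    state: dict = field(default_factory=dict)  # installed apps, files, settings

    def reset(self) -> None:
        # Stand-in for restoring the device from a clean image.
        self.state.clear()


class VirtualDevicePool:
    """In-memory stand-in for a cluster-hosted pool of virtual devices."""

    def __init__(self) -> None:
        self._available: dict = {}    # device_type -> list of free devices
        self._checked_out: dict = {}  # device_id -> device in use
        self._ids = itertools.count()

    def provision(self, device_type: str, count: int) -> None:
        # On a real cluster this would launch `count` emulator instances.
        pool = self._available.setdefault(device_type, [])
        for _ in range(count):
            pool.append(VirtualDevice(next(self._ids), device_type))

    def available(self, device_type: str) -> int:
        return len(self._available.get(device_type, []))

    def check_out(self, device_type: str, count: int) -> list:
        pool = self._available.get(device_type, [])
        if len(pool) < count:
            raise RuntimeError(f"only {len(pool)} {device_type} devices free")
        devices = [pool.pop() for _ in range(count)]
        for d in devices:
            self._checked_out[d.device_id] = d
        return devices

    def release(self, device: VirtualDevice) -> None:
        # Reset before returning the device so the next developer gets a clean one.
        del self._checked_out[device.device_id]  # KeyError if not checked out
        device.reset()
        self._available[device.device_type].append(device)
```

Resetting inside `release` is a design choice that keeps dirty devices out of the pool even when a developer forgets to reset. A real service would also need leases or timeouts for devices that are never released.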
When corporate scandal hits retail investors close to home
People reduce their participation in the stock market after a case of corporate fraud in their state, write Mariassunta Giannetti and Tracy Yue Wang.